Multi-Chain Prefetching: Exploiting Natural Memory Parallelism in Pointer-Chasing Codes

Authors

  • Nicholas Kohout
  • Seungryul Choi
  • Donald Yeung
Abstract

This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-chain prefetching tolerates serialized memory latency commonly found in pointer-chasing codes via aggressive prefetch scheduling. Unlike conventional prefetching techniques that hide memory latency underneath a single traversal loop or recursive function exclusively, multi-chain prefetching initiates prefetches for a chain of pointers prior to the traversal code, thus exploiting "pre-loop" work to help overlap serialized memory latency. As prefetch chains are scheduled increasingly early to accommodate long pointer chains, multi-chain prefetching overlaps prefetches across multiple independent linked structures, thus exploiting the natural memory parallelism that exists between separate pointer-chasing loops or recursive function calls. This paper makes three contributions in the context of multi-chain prefetching. First, we introduce a prefetch scheduling technique that exploits pre-loop work and inter-chain memory parallelism to tolerate serialized memory latency. To our knowledge, our scheduling algorithm is the first of its kind to expose natural memory parallelism in pointer-chasing codes. Second, we present the design of a prefetch engine that generates a prefetch address stream at runtime, and issues prefetches according to the prefetch schedule computed by our scheduling algorithm. Finally, we conduct an experimental evaluation of multi-chain prefetching using six pointer-chasing applications, and compare it against an existing technique, jump pointer prefetching. Our results show that multi-chain prefetching is an effective latency tolerance technique for pointer-chasing applications. On the four most memory-bound applications in our suite, multi-chain prefetching reduces execution time between 40% and 66%, and by 2.1% and 3.6% for the other two applications, compared to no prefetching.
Multi-chain prefetching also outperforms jump pointer prefetching across all six of our applications, reducing execution time between 24% and 64% for the memory-bound applications, and by 2.1% and 4.9% for the other two applications, compared to jump pointer prefetching. Finally, multi-chain prefetching achieves its performance advantages without using jump pointers; hence, it does not require the intrusive code transformations necessary to create and manage jump pointer state.
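
The idea of overlapping memory latency across independent pointer chains can be illustrated with a small software analogy. The sketch below is not the paper's scheduling algorithm or its prefetch engine; it is a minimal C example, assuming a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`. The names `node_t`, `MAX_CHAINS`, and `sum_chains_interleaved` are hypothetical. It advances several independent linked lists in round-robin order and requests each list's next node before moving on, so the cache miss for one chain can overlap with work on the others.

```c
#include <stddef.h>

/* Assumed upper bound on the number of chains kept in flight at once. */
#define MAX_CHAINS 8

/* Hypothetical node type: one element of a singly linked list. */
typedef struct node {
    long value;
    struct node *next;
} node_t;

/* Sum the values of all nodes in `nlists` independent lists.
 * Lists are processed in groups of up to MAX_CHAINS. Within a group,
 * one node per list is visited per round, and the following node of
 * that list is prefetched before switching to the next list. */
long sum_chains_interleaved(node_t *const heads[], size_t nlists)
{
    long total = 0;

    for (size_t base = 0; base < nlists; base += MAX_CHAINS) {
        size_t n = nlists - base < MAX_CHAINS ? nlists - base : MAX_CHAINS;
        node_t *cur[MAX_CHAINS];
        size_t live = 0;

        /* "Pre-loop" step: request the head of every chain in the group. */
        for (size_t k = 0; k < n; k++) {
            cur[k] = heads[base + k];
            if (cur[k]) {
                __builtin_prefetch(cur[k], 0, 1);
                live++;
            }
        }

        /* Round-robin traversal: each chain's next node is requested
         * while the other chains in the group are being visited. */
        while (live > 0) {
            for (size_t k = 0; k < n; k++) {
                node_t *p = cur[k];
                if (!p)
                    continue;           /* this chain is already finished */
                total += p->value;
                cur[k] = p->next;
                if (cur[k])
                    __builtin_prefetch(cur[k], 0, 1);
                else
                    live--;             /* chain k just ended */
            }
        }
    }
    return total;
}
```

Interleaving is valid here only because summation does not depend on visiting order; a traversal with order-dependent side effects would need a different arrangement. Whether this helps in practice depends on list length, node layout, and the memory system, so it should be measured rather than assumed.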

Similar resources

Multi-Chain Prefetching: Exploiting Memory Parallelism in Pointer-Chasing Codes

As the processor-memory performance gap continues to widen, application performance becomes increasingly limited by the memory system. Applications that employ linked data structures (LDSs) are particularly challenging from the standpoint of the memory system because of the memory serialization effects associated with indirect memory addressing. Also known as the pointer chasing problem, such me...

The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems

Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques for three types of applications: regular scientific codes, irregular scie...

Asynchronous Memory Access Chaining

In-memory databases rely on pointer-intensive data structures to quickly locate data in memory. A single lookup operation in such data structures often exhibits long-latency memory stalls due to dependent pointer dereferences. Hiding the memory latency by launching additional memory accesses for other lookups is an effective way of improving performance of pointer-chasing codes (e.g., hash tabl...

Storage Efficient Hardware Prefetching using Delta-Correlating Prediction Tables

This paper presents a novel prefetching heuristic called Delta Correlating Prediction Tables (DCPT). DCPT builds upon two previously proposed techniques, RPT prefetching by Chen and Baer and PC/DC prefetching by Nesbit and Smith. It combines the storage-efficient table based design of Reference Prediction Tables (RPT) with the high performance delta correlating design of PC/DC. DCPT substantiall...

A Programmable Memory Hierarchy for Prefetching Linked Data Structures

Prefetching is often used to overlap memory latency with computation for array-based applications. However, prefetching for pointer-intensive applications remains a challenge because of the irregular memory access pattern and pointer-chasing problem. In this paper, we use a programmable processor, a prefetch engine (PFE), at each level of the memory hierarchy to cooperatively execute instruction...

Publication date: 2000